
    Predicting gene ontology from a global meta-analysis of 1-color microarray experiments

    Background: Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict function for genes of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with them across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample, dataset size and ranking parameters affect prediction performance. Results: 13,000 human 1-color microarrays were downloaded from GEO for GMA. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted function and annotated function for sets of 100 randomly selected genes. We find the number of new predicted functions rises as more datasets are added, but begins to saturate at a sample size of approximately 2,000 experiments. For the gene set used to predict function, we find precision to be higher with smaller set sizes, yet with correspondingly poor recall and, as set size is increased, recall and F-measure also tend to increase but at the cost of precision. Conclusions: Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5% of them. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology using top 40 co-expressed genes for prediction analysis. For the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significantly overrepresented ontologies, suggesting their regulation may be more complex.
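The precision, recall and F-measure trade-off mentioned in this abstract can be illustrated with a minimal sketch. It assumes exact-match scoring between sets of GO term identifiers, which is a simplification: the paper benchmarks by distance within the GO tree, and that is not reproduced here. The function name and the example identifiers are hypothetical.

```python
def precision_recall_f1(predicted, annotated):
    """Score predicted GO terms against annotated ones by exact match.

    Assumption: terms are compared as plain identifiers; no GO-tree
    distance is used, unlike the benchmark described in the abstract.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical identifiers, for illustration only.
small_set = precision_recall_f1({"GO:A", "GO:B"}, {"GO:A", "GO:C", "GO:D", "GO:E"})
```

A larger predicted set can only keep or raise recall under this scoring, while every additional wrong term lowers precision, which matches the direction of the trade-off the abstract reports.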

    Systematic classification of non-coding RNAs by epigenomic similarity

    BACKGROUND: Even though only 1.5% of the human genome is translated into proteins, recent reports indicate that most of it is transcribed into non-coding RNAs (ncRNAs), which are becoming the subject of increased scientific interest. We hypothesized that examining how different classes of ncRNAs co-localized with annotated epigenomic elements could help understand the functions, regulatory mechanisms, and relationships among ncRNA families. RESULTS: We examined 15 different ncRNA classes for statistically significant genomic co-localizations with cell type-specific chromatin segmentation states, transcription factor binding sites (TFBSs), and histone modification marks using GenomeRunner (http://www.genomerunner.org). P-values were obtained using a Chi-square test and corrected for multiple testing using the Benjamini-Hochberg procedure. We clustered and visualized the ncRNA classes by the strength of their statistical enrichments and depletions. We found piwi-interacting RNAs (piRNAs) to be depleted in regions containing activating histone modification marks, such as H3K4 mono-, di- and trimethylation, H3K27 acetylation, as well as certain TFBSs. piRNAs were further depleted in active promoters, weak transcription, and transcription elongation regions, and enriched in repressed and heterochromatic regions. Conversely, transfer RNAs (tRNAs) were depleted in heterochromatin regions and strongly enriched in regions containing activating H3K4 di- and trimethylation marks, H2az histone variant, and a variety of TFBSs. Interestingly, regions containing CTCF insulator protein binding sites were associated with tRNAs. tRNAs were also enriched in the active, weak and poised promoters and, surprisingly, in regions with repetitive/copy number variations. 
CONCLUSIONS: Searching for statistically significant associations between ncRNA classes and epigenomic elements permits detection of potential functional and/or regulatory relationships among ncRNA classes, and suggests cell type-specific biological roles of ncRNAs.
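The Benjamini-Hochberg correction named in this abstract can be sketched as follows. This is a generic illustration of the procedure, not GenomeRunner's code; the function name and the example p-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, for illustration only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the order of the raw p-values.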

    Detrimental effects of duplicate reads and low complexity regions on RNA- and ChIP-seq data

    Background: Adapter trimming and removal of duplicate reads are common practices in next-generation sequencing pipelines. Sequencing reads ambiguously mapped to repetitive and low complexity regions can also be problematic for accurate assessment of the biological signal, yet their impact on sequencing data has not received much attention. We investigate how trimming the adapters, removing duplicates, and filtering out reads overlapping low complexity regions influence the significance of biological signal in RNA- and ChIP-seq experiments. Methods: We assessed the effect of data processing steps on the alignment statistics and the functional enrichment analysis results of RNA- and ChIP-seq data. We compared differentially processed RNA-seq data with matching microarray data on the same patient samples to determine whether changes in pre-processing improved correlation between the two. We have developed a simple tool to remove low complexity regions, RepeatSoaker, available at https://github.com/mdozmorov/RepeatSoaker, and tested its effect on the alignment statistics and the results of the enrichment analyses. Results: Both adapter trimming and duplicate removal moderately improved the strength of biological signals in RNA-seq and ChIP-seq data. Aggressive filtering of reads overlapping with low complexity regions, as defined by RepeatMasker, further improved the strength of biological signals, and the correlation between RNA-seq and microarray gene expression data. Conclusions: Adapter trimming and duplicate removal, coupled with filtering out reads overlapping low complexity regions, are shown to increase the quality and reliability of detecting biological signals in RNA-seq and ChIP-seq data.
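The idea of filtering out reads that overlap low complexity regions can be sketched in a few lines. This is not RepeatSoaker's implementation; it is an illustration that assumes reads and masked regions are given as hypothetical `(chromosome, start, end)` tuples with half-open coordinates, and that any overlap at all causes a read to be dropped.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def filter_reads(reads, masked_regions):
    """Keep only reads that do not overlap any masked region.

    Assumption: inputs are (chrom, start, end) tuples; a real pipeline
    would work on alignment files and use an indexed interval structure.
    """
    regions_by_chrom = {}
    for chrom, start, end in masked_regions:
        regions_by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in reads:
        regions = regions_by_chrom.get(chrom, [])
        if not any(overlaps(start, end, s, e) for s, e in regions):
            kept.append((chrom, start, end))
    return kept


# Made-up coordinates, for illustration only.
reads = [("chr1", 100, 150), ("chr1", 190, 240), ("chr2", 100, 150)]
masked = [("chr1", 200, 300)]
kept = filter_reads(reads, masked)
```

The linear scan per read is adequate for a toy example; for genome-scale data a sorted or tree-based interval index would be the usual choice.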

    Nicotinamide mononucleotide (NMN) supplementation promotes anti-aging miRNA expression profile in the aorta of aged mice, predicting epigenetic rejuvenation and anti-atherogenic effects

    Understanding molecular mechanisms involved in vascular aging is essential to develop novel interventional strategies for treatment and prevention of age-related vascular pathologies. Recent studies provide critical evidence that vascular aging is characterized by NAD+ depletion. Importantly, in aged mice, restoration of cellular NAD+ levels by treatment with the NAD+ booster nicotinamide mononucleotide (NMN) exerts significant vasoprotective effects, improving endothelium-dependent vasodilation, attenuating oxidative stress, and rescuing age-related changes in gene expression. Strong experimental evidence shows that dysregulation of microRNAs (miRNAs) has a role in vascular aging. The present study was designed to test the hypothesis that age-related NAD+ depletion is causally linked to dysregulation of vascular miRNA expression. A corollary hypothesis is that functional vascular rejuvenation in NMN-treated aged mice is also associated with restoration of a youthful vascular miRNA expression profile. To test these hypotheses, aged (24-month-old) mice were treated with NMN for 2 weeks and miRNA signatures in the aortas were compared to those in aortas obtained from untreated young and aged control mice. We found that protective effects of NMN treatment on vascular function are associated with anti-aging changes in the miRNA expression profile in the aged mouse aorta. The predicted regulatory effects of NMN-induced differentially expressed miRNAs in aged vessels include anti-atherogenic effects and epigenetic rejuvenation. Future studies will uncover the mechanistic role of miRNA gene expression regulatory networks in the anti-aging effects of NAD+ booster treatments and determine the links between miRNAs regulated by NMN and sirtuin activators and miRNAs known to act in the conserved pathways of aging and major aging-related vascular diseases.

    MNEMONIC: MetageNomic Experiment Mining to create an OTU Network of Inhabitant Correlations

    Background: The number of publicly available metagenomic experiments in various environments has been rapidly growing, empowering the potential to identify similar shifts in species abundance between different experiments. This could be a potentially powerful way to interpret new experiments, by identifying common themes and causes behind changes in species abundance. Results: We propose a novel framework for comparing microbial shifts between conditions. Using data from one of the largest human metagenome projects to date, the American Gut Project (AGP), we obtain differential abundance vectors for microbes using experimental condition information provided with the AGP metadata, such as patient age, dietary habits, or health status. We show it can be used to identify similar and opposing shifts in microbial species, and infer putative interactions between microbes. Our results show that groups of shifts with similar effects on the microbiome can be identified and that similar dietary interventions display similar microbial abundance shifts. Conclusions: Without comparison to prior data, it is difficult for experimentalists to know if their observed changes in species abundance have been observed by others, both in their conditions and in others they would never consider comparable. Yet, this can be a very important contextual factor in interpreting the significance of a shift. We’ve proposed and tested an algorithmic solution to this problem, which also allows for comparing the metagenomic signature shifts between conditions in the existing body of data.
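One way to picture "similar and opposing shifts" is to correlate two differential abundance vectors over the taxa they share. The abstract does not state which similarity measure MNEMONIC uses, so Pearson correlation here is an assumption, as are the function names, the dictionary layout (taxon name mapped to a log fold change) and the example values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def shift_similarity(shift_a, shift_b):
    """Correlate two {taxon: log fold change} mappings over their shared taxa."""
    shared = sorted(set(shift_a) & set(shift_b))
    if len(shared) < 2:
        return 0.0  # too few shared taxa to correlate
    return pearson([shift_a[t] for t in shared], [shift_b[t] for t in shared])


# Made-up log fold changes, for illustration only.
condition_1 = {"taxon_a": 1.0, "taxon_b": 2.0, "taxon_c": 3.0}
condition_2 = {"taxon_a": -1.0, "taxon_b": -2.0, "taxon_c": -3.0, "taxon_d": 0.5}
similarity = shift_similarity(condition_1, condition_2)
```

Under this assumed measure, a value near 1 would indicate similar shifts between two conditions and a value near -1 opposing shifts.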